The Bayesian Spam Filter with NCD

نویسندگان

  • Michal Prilepok
  • Jan Platos
  • Václav Snásel
  • Eyas El-Qawasmeh
چکیده

Undesired e-mail (spam) becomes a big problem nowadays not only for users, but also for Internet providers. One of the main obstacles for elimination of this problem is a complicated security issues. In particular, the very low rate of falsely detected e-mails in practice. We tried to eliminate this problem by a sufficient similarity check of checked e-mails with e-mails marked as unrequested or spam. This paper uses Bayesian algorithm with many variations. For comparison purposes, we used a normalized compression distance which helped us reduce the rate of false detection of individual mails.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach to Spam Mail Detection

The ever increasing menace of spam is bringing down productivity. More than 70% of the email messages are spam, and it has become a challenge to separate such messages from the legitimate ones. I have developed a spam identification engine which employs naive Bayesian classifier to identify spam. A new concept-based mining model that analyzes terms on the sentence, document is introduced. . The...

متن کامل

AN EVALUATION OF FILTERING TECHNIQUES IN A NAÏVE BAYESIAN ANTI-SPAM FILTER by

An efficient anti-spam filter that would block all unsolicited messages i.e. spam, without blocking any legitimate messages is a growing need. To address this problem, this report takes a statistically-based approach, employing a Bayesian anti-spam filter, because it is content-based and self-learning (adaptive) in nature. We train the filter, using a large corpus of legitimate messages and spa...

متن کامل

Introduction of Fingerprint Vector based Bayesian Method for Spam Filtering

With the development of the diversification of spam, it raises the difficulties and challenges to content-based spam filtering. To address this problem, this paper firstly introduced the statistical features of Email headers, and then proposed a method to use these features to improve Bayesian anti-spam filter. The selected Email-header features are presented as the fingerprint vectors, and the...

متن کامل

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

We investigate the performance of two machine learning algorithms in the context of antispam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an ...

متن کامل

An evaluation of Naive Bayesian anti-spam filtering

It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performan...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012